nd608 - Project Personalized Real Estate Agent¶

In [ ]:
# Load environment variables from a .env file. Alternatively, you can
# manually set the value of OPENAI_API_KEY in this cell.

from io import BytesIO
from os import environ
from pathlib import Path

try:
    from dotenv import load_dotenv
    load_dotenv()
except ModuleNotFoundError:
    pass

if "OPENAI_API_KEY" not in environ:
    environ["OPENAI_API_KEY"] = "your-openai-api-key"
In [ ]:
import pickle

from textwrap import dedent

import lancedb
import openai
import requests
import torch

from IPython.display import display, Markdown
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from PIL import Image
from pydantic import BaseModel, Field, NonNegativeFloat, NonNegativeInt
from transformers import AutoTokenizer, CLIPProcessor, CLIPModel

from models import RealEstateListingLanceRecord
In [ ]:
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
In [ ]:
device = "cpu"

if torch.cuda.is_available():
    device = "cuda"  # use the GPU when CUDA is available
elif torch.backends.mps.is_available():
    device = "mps"

Generating Real Estate Listings¶

The purpose of this notebook is to generate synthetic real estate listings using OpenAI's generative AI APIs. We'll also create a LanceDB table that stores the generated listings together with their embeddings.

We'll use LangChain's PromptTemplate and PydanticOutputParser to generate the listings in a structured format, which makes them easier to store in a table. We'll follow the format suggested in the project's instructions:

Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.
In [ ]:
class RealEstateListingModelOutput(BaseModel):
    neighborhood: str = Field(description="Name of the neighborhood")
    price: NonNegativeInt = Field(description="List price of the property")
    bedrooms: NonNegativeInt = Field(description="Number of bedrooms of the property")
    bathrooms: NonNegativeFloat | NonNegativeInt = Field(description="Number of bathrooms of the property")
    description: str = Field(description="Description of the property")
    neighborhood_description: str = Field(description="Description of the neighborhood")


class RealEstateListingsModelOutput(BaseModel):
    listings: list[RealEstateListingModelOutput]


class RealEstateListing(RealEstateListingModelOutput):
    image_bytes: bytes | None = Field(description="Contents of the generated image", default=None)
    
    @property
    def image_as_pil(self):
        return Image.open(BytesIO(self.image_bytes))
In [ ]:
parser = PydanticOutputParser(pydantic_object=RealEstateListingsModelOutput)
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"RealEstateListingModelOutput": {"properties": {"neighborhood": {"description": "Name of the neighborhood", "title": "Neighborhood", "type": "string"}, "price": {"description": "List price of the property", "minimum": 0, "title": "Price", "type": "integer"}, "bedrooms": {"description": "Number of bedrooms of the property", "minimum": 0, "title": "Bedrooms", "type": "integer"}, "bathrooms": {"anyOf": [{"minimum": 0.0, "type": "number"}, {"minimum": 0, "type": "integer"}], "description": "Number of bathrooms of the property", "title": "Bathrooms"}, "description": {"description": "Description of the property", "title": "Description", "type": "string"}, "neighborhood_description": {"description": "Description of the neighborhood", "title": "Neighborhood Description", "type": "string"}}, "required": ["neighborhood", "price", "bedrooms", "bathrooms", "description", "neighborhood_description"], "title": "RealEstateListingModelOutput", "type": "object"}}, "properties": {"listings": {"items": {"$ref": "#/$defs/RealEstateListingModelOutput"}, "title": "Listings", "type": "array"}}, "required": ["listings"]}
```
In [ ]:
prompt = PromptTemplate(
    template=dedent("""\
        You are a writer and a real estate expert with extensive
        knowledge of the terminology and capable of writing lengthy,
        easy to read and factual descriptions of properties.

        Generate {num_listings} listings of imaginary real estate
        properties. The description of the property should include detailed
        mentions of the property's features like the number of bedrooms and
        bathrooms. The description of the property should include details
        about the exterior as well. The description of the property should
        contain 3 sentences. Include both upper-middle class and lower
        income neighborhoods. The neighborhood income level should be
        consistent with the property. Include a few fixer-upper properties.
    """) + "\n{format_instructions}",
    input_variables=["num_listings"],  # must match the placeholder used in the template
    partial_variables={
        "format_instructions": parser.get_format_instructions
    },
)
In [ ]:
print(prompt.format(num_listings=15))
You are a writer and a real estate expert with extensive
knowledge of the terminology and capable of writing lengthy,
easy to read and factual descriptions of properties.

Generate 15 listings of imaginary real estate
properties. The description of the property should include detailed
mentions of the property's features like the number of bedrooms and
bathrooms. The description of the property should include details
about the exterior as well. The description of the property should
contain 3 sentences. Include both upper-middle class and lower
income neighborhoods. The neighborhood income level should be
consistent with the property. Include a few fixer-upper properties.

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"RealEstateListingModelOutput": {"properties": {"neighborhood": {"description": "Name of the neighborhood", "title": "Neighborhood", "type": "string"}, "price": {"description": "List price of the property", "minimum": 0, "title": "Price", "type": "integer"}, "bedrooms": {"description": "Number of bedrooms of the property", "minimum": 0, "title": "Bedrooms", "type": "integer"}, "bathrooms": {"anyOf": [{"minimum": 0.0, "type": "number"}, {"minimum": 0, "type": "integer"}], "description": "Number of bathrooms of the property", "title": "Bathrooms"}, "description": {"description": "Description of the property", "title": "Description", "type": "string"}, "neighborhood_description": {"description": "Description of the neighborhood", "title": "Neighborhood Description", "type": "string"}}, "required": ["neighborhood", "price", "bedrooms", "bathrooms", "description", "neighborhood_description"], "title": "RealEstateListingModelOutput", "type": "object"}}, "properties": {"listings": {"items": {"$ref": "#/$defs/RealEstateListingModelOutput"}, "title": "Listings", "type": "array"}}, "required": ["listings"]}
```

We'll use OpenAI's gpt-4-turbo model because it is more likely to follow the formatting instructions.

In [ ]:
llm = ChatOpenAI(
    model_name="gpt-4-turbo",
    temperature=0.2,  # Sacrificing reproducibility to give the model some leeway
    max_tokens=4000
)

We pipe the model's completion to the LangChain output parser to generate the Pydantic models of the listings.

In [ ]:
parsed_model_response = (llm | parser).invoke(prompt.format(num_listings=15))
parsed_model_response.listings
Out[ ]:
[RealEstateListingModelOutput(neighborhood='Maplewood Heights', price=450000, bedrooms=4, bathrooms=3, description='Charming two-story home featuring a spacious kitchen with modern appliances, a cozy living room with a fireplace, and a large backyard with a deck. The master suite offers a walk-in closet and a luxurious bathroom with a soaking tub. Located in a family-friendly neighborhood with excellent schools and parks nearby.', neighborhood_description='Maplewood Heights is known for its tree-lined streets, active community association, and well-maintained parks.'),
 RealEstateListingModelOutput(neighborhood='Elm Grove', price=320000, bedrooms=3, bathrooms=2, description='This single-story home boasts an open floor plan with a large living area, a functional kitchen, and hardwood floors throughout. The property includes a private, fenced backyard and a two-car garage. Elm Grove is a peaceful neighborhood with convenient access to shopping and public transportation.', neighborhood_description='Elm Grove offers a quiet suburban lifestyle with easy access to urban amenities.'),
 RealEstateListingModelOutput(neighborhood='Cedar Park', price=275000, bedrooms=3, bathrooms=1.5, description='Ideal starter home needing some TLC, featuring a compact kitchen, a comfortable living room, and a sizable backyard. This fixer-upper is a great opportunity for buyers looking to customize their home. Cedar Park is an up-and-coming neighborhood with a lot of potential for growth.', neighborhood_description='Cedar Park is a lower-income area with a strong sense of community and ongoing development projects.'),
 RealEstateListingModelOutput(neighborhood='Pine Ridge', price=600000, bedrooms=5, bathrooms=4, description='Luxurious estate with an expansive garden, gourmet kitchen with state-of-the-art appliances, and a master suite with a private balcony overlooking the grounds. The home includes a home theater and a wine cellar. Pine Ridge is an exclusive neighborhood known for its high-end properties and excellent security.', neighborhood_description='Pine Ridge is a prestigious area with large, well-appointed homes and meticulously landscaped gardens.'),
 RealEstateListingModelOutput(neighborhood='Willow Creek', price=200000, bedrooms=2, bathrooms=1, description='Cozy bungalow perfect for first-time homeowners, featuring an updated kitchen, a bright living room, and a small, manageable yard. This home is move-in ready and offers a comfortable living space at an affordable price. Willow Creek is a friendly neighborhood with a mix of older and newly renovated homes.', neighborhood_description='Willow Creek is a diverse community with affordable housing options and a welcoming atmosphere.'),
 RealEstateListingModelOutput(neighborhood='Sunnyvale', price=750000, bedrooms=4, bathrooms=3.5, description='Stunning contemporary home with high ceilings, floor-to-ceiling windows, and a sleek, minimalist design. The property features an outdoor pool and a spacious patio for entertaining. Sunnyvale is a vibrant neighborhood with a focus on sustainability and modern living.', neighborhood_description='Sunnyvale is a modern neighborhood with eco-friendly homes and a lively community.'),
 RealEstateListingModelOutput(neighborhood='Oakwood Estates', price=550000, bedrooms=4, bathrooms=2.5, description='Beautifully maintained colonial-style home with a formal dining room, a library, and a large kitchen with an island. The home sits on a half-acre lot with mature trees and a private garden. Oakwood Estates is a quiet, upscale neighborhood with excellent schools and community facilities.', neighborhood_description='Oakwood Estates is known for its traditional architecture and spacious, elegant homes.'),
 RealEstateListingModelOutput(neighborhood='Riverbend', price=225000, bedrooms=3, bathrooms=2, description='Affordable family home with recent updates, including new roofing and HVAC system. The home features a functional layout with a spacious living room and a kitchen with ample storage. Riverbend is a growing neighborhood with a mix of residential and commercial developments.', neighborhood_description='Riverbend is a revitalizing area with affordable housing and new local businesses.'),
 RealEstateListingModelOutput(neighborhood='Lakeside Village', price=490000, bedrooms=3, bathrooms=2.5, description='Waterfront property with breathtaking views of the lake, featuring a modern kitchen, a sunroom, and a private dock. This home is perfect for those who enjoy outdoor activities and a peaceful setting. Lakeside Village is a serene neighborhood with a strong community spirit and outdoor recreational opportunities.', neighborhood_description='Lakeside Village is a tranquil area with homes overlooking the water and a variety of water-based activities.'),
 RealEstateListingModelOutput(neighborhood='Highland Park', price=650000, bedrooms=5, bathrooms=3.5, description='Spacious family home with a large foyer, an updated kitchen with granite countertops, and a master suite with dual vanities and a jetted tub. The property includes a landscaped yard and a three-car garage. Highland Park is known for its safe streets and luxurious homes.', neighborhood_description='Highland Park is a prestigious neighborhood with well-maintained properties and active community engagement.'),
 RealEstateListingModelOutput(neighborhood='Brookside', price=180000, bedrooms=2, bathrooms=1, description='Compact and affordable fixer-upper with great potential, needing updates and repairs. Ideal for investors or first-time buyers willing to put in some work. Brookside is a lower-income neighborhood with a real sense of community and convenient access to local amenities.', neighborhood_description='Brookside is an economically diverse area with a mix of housing styles and community-driven improvements.'),
 RealEstateListingModelOutput(neighborhood='Silverleaf', price=530000, bedrooms=4, bathrooms=3, description="Elegant home in a gated community, featuring high ceilings, a chef's kitchen, and a covered patio perfect for entertaining. The master bedroom includes a large en-suite bathroom with a walk-in shower and a soaking tub. Silverleaf is a secure and private neighborhood with luxurious amenities and beautifully landscaped common areas.", neighborhood_description='Silverleaf is an exclusive community known for its security and high-quality living standards.'),
 RealEstateListingModelOutput(neighborhood='Meadowview', price=210000, bedrooms=3, bathrooms=1.5, description='This charming fixer-upper offers a unique opportunity for customization, with a spacious layout and original hardwood floors. The home is situated on a large lot with room for expansion. Meadowview is an affordable neighborhood with a friendly community and easy access to downtown.', neighborhood_description='Meadowview is a budget-friendly neighborhood with potential for property value growth.'),
 RealEstateListingModelOutput(neighborhood='Evergreen Terrace', price=310000, bedrooms=3, bathrooms=2, description='Well-maintained ranch-style home with a modern kitchen, a cozy fireplace in the living room, and a large deck in the backyard. The home is energy-efficient with solar panels and a new water heater. Evergreen Terrace is a quiet neighborhood with green spaces and a close-knit community.', neighborhood_description='Evergreen Terrace is known for its sustainability initiatives and strong community bonds.'),
 RealEstateListingModelOutput(neighborhood='Grandview Heights', price=470000, bedrooms=4, bathrooms=2.5, description='Modern home with an open floor plan, featuring a state-of-the-art kitchen, a spacious living area, and large windows that provide plenty of natural light. The home includes a beautifully landscaped garden and a two-car garage. Grandview Heights is a vibrant neighborhood with a variety of shops, restaurants, and cultural attractions.', neighborhood_description='Grandview Heights is a lively and upscale neighborhood with a rich cultural scene and excellent amenities.')]
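
To make the parsing step less opaque, here is a minimal standard-library sketch of what an output parser conceptually does: pull the JSON payload out of the completion, load it, and check it against the expected fields. This is not LangChain's implementation; `Listing`, `parse_listings`, and `FENCE` are hypothetical names used only for illustration, and the sample values are abbreviated from the Cedar Park listing above.

```python
import json
import re
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code fence that models often wrap JSON in


@dataclass
class Listing:
    # Hypothetical stand-in for RealEstateListingModelOutput.
    neighborhood: str
    price: int
    bedrooms: int
    bathrooms: float
    description: str
    neighborhood_description: str


def parse_listings(completion: str) -> list[Listing]:
    """Extract the JSON payload from a completion and build Listing objects."""
    # Strip a surrounding Markdown fence if the model added one.
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, completion, re.DOTALL)
    payload = match.group(1) if match else completion
    data = json.loads(payload)

    listings = []
    for item in data["listings"]:
        listing = Listing(**item)  # TypeError on missing or unexpected keys
        if listing.price < 0 or listing.bedrooms < 0 or listing.bathrooms < 0:
            raise ValueError(f"negative value in listing: {item}")
        listings.append(listing)
    return listings


sample = {
    "listings": [
        {
            "neighborhood": "Cedar Park",
            "price": 275000,
            "bedrooms": 3,
            "bathrooms": 1.5,
            "description": "Ideal starter home needing some TLC.",
            "neighborhood_description": "Cedar Park is a lower-income area.",
        }
    ]
}
completion = FENCE + "json\n" + json.dumps(sample) + "\n" + FENCE
parsed = parse_listings(completion)
print(parsed[0].neighborhood, parsed[0].price)
```

PydanticOutputParser goes further than this sketch, since Pydantic also coerces and validates field types, which is why the notebook relies on it rather than on hand-written checks.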

Let's save the generated real estate listings to avoid hitting the model multiple times.

In [ ]:
with open(data_dir / "listings.pickle", "wb") as f:
    pickle.dump(parsed_model_response.listings, f)

Read the listings back.

In [ ]:
with open(data_dir / "listings.pickle", "rb") as f:
    listings = pickle.load(f)
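
The save-then-load pattern above can be folded into a small helper that only calls the model when no cached file exists. This is a sketch under the assumption that the cached object is picklable; `load_or_generate` and `fake_generate` are hypothetical names, and the stand-in generator replaces the real model call.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_generate(path: Path, generate: Callable[[], Any]) -> Any:
    """Return the pickled object at `path`, generating and saving it first if missing."""
    if path.exists():
        with open(path, "rb") as f:
            return pickle.load(f)
    result = generate()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


# Stand-in for the expensive model call; records how often it runs.
calls = []


def fake_generate():
    calls.append(1)
    return ["listing A", "listing B"]


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "listings.pickle"
    first = load_or_generate(cache_path, fake_generate)   # generates and saves
    second = load_or_generate(cache_path, fake_generate)  # reads from disk

print(first == second, len(calls))
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.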

Generating Real Estate Listings Images¶

To make the recommendation app more usable, we'll generate an image for each listing with OpenAI's DALL-E. We add camera-style hints to the prompt to steer the model toward photorealistic images.
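
As a rough sketch, the prompt construction could be wrapped in a helper that trims long descriptions so the photographic hints are never cut off. The 1,000-character budget is an assumption about the dall-e-2 prompt limit, and `build_image_prompt`, `PHOTO_HINTS`, and `MAX_PROMPT_CHARS` are hypothetical names that are not used elsewhere in this notebook.

```python
PHOTO_HINTS = " 1/100s, ISO 100, Daylight."
MAX_PROMPT_CHARS = 1000  # assumed prompt limit for dall-e-2


def build_image_prompt(description: str, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Build the image prompt, trimming the description so the hints always fit."""
    prefix = "Photo of "
    # Reserve room for the prefix, the closing period, and the hints.
    budget = max_chars - len(prefix) - len(PHOTO_HINTS) - 1
    trimmed = description.strip().rstrip(".")[:budget]
    return f"{prefix}{trimmed}.{PHOTO_HINTS}"


prompt_text = build_image_prompt("Cozy bungalow with a small, manageable yard.")
print(prompt_text)
```

The generated descriptions are only three sentences long, so the trimming rarely matters here, but the helper also avoids the doubled period that appears when a description already ends with one.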

In [ ]:
client = openai.OpenAI()
In [ ]:
listings_with_image = []

for listing in listings:
    display(Markdown(f"Generating image for listing with description: _'{listing.description}'_...."))

    dalle2_response = client.images.generate(
        model="dall-e-2",
        prompt=f"Photo of {listing.description}. 1/100s, ISO 100, Daylight.",
        size="512x512",
        quality="standard",
        n=1,
    )

    response = requests.get(dalle2_response.data[0].url)
    response.raise_for_status()

    listings_with_image.append(
        RealEstateListing(
            **listing.model_dump(),
            image_bytes=response.content,
        )
    )

    image = Image.open(BytesIO(response.content))

    display(image)

Generating image for listing with description: 'Charming two-story home featuring a spacious kitchen with modern appliances, a cozy living room with a fireplace, and a large backyard with a deck. The master suite offers a walk-in closet and a luxurious bathroom with a soaking tub. Located in a family-friendly neighborhood with excellent schools and parks nearby.'....

Generating image for listing with description: 'This single-story home boasts an open floor plan with a large living area, a functional kitchen, and hardwood floors throughout. The property includes a private, fenced backyard and a two-car garage. Elm Grove is a peaceful neighborhood with convenient access to shopping and public transportation.'....

Generating image for listing with description: 'Ideal starter home needing some TLC, featuring a compact kitchen, a comfortable living room, and a sizable backyard. This fixer-upper is a great opportunity for buyers looking to customize their home. Cedar Park is an up-and-coming neighborhood with a lot of potential for growth.'....

Generating image for listing with description: 'Luxurious estate with an expansive garden, gourmet kitchen with state-of-the-art appliances, and a master suite with a private balcony overlooking the grounds. The home includes a home theater and a wine cellar. Pine Ridge is an exclusive neighborhood known for its high-end properties and excellent security.'....

Generating image for listing with description: 'Cozy bungalow perfect for first-time homeowners, featuring an updated kitchen, a bright living room, and a small, manageable yard. This home is move-in ready and offers a comfortable living space at an affordable price. Willow Creek is a friendly neighborhood with a mix of older and newly renovated homes.'....

Generating image for listing with description: 'Stunning contemporary home with high ceilings, floor-to-ceiling windows, and a sleek, minimalist design. The property features an outdoor pool and a spacious patio for entertaining. Sunnyvale is a vibrant neighborhood with a focus on sustainability and modern living.'....

Generating image for listing with description: 'Beautifully maintained colonial-style home with a formal dining room, a library, and a large kitchen with an island. The home sits on a half-acre lot with mature trees and a private garden. Oakwood Estates is a quiet, upscale neighborhood with excellent schools and community facilities.'....

Generating image for listing with description: 'Affordable family home with recent updates, including new roofing and HVAC system. The home features a functional layout with a spacious living room and a kitchen with ample storage. Riverbend is a growing neighborhood with a mix of residential and commercial developments.'....

Generating image for listing with description: 'Waterfront property with breathtaking views of the lake, featuring a modern kitchen, a sunroom, and a private dock. This home is perfect for those who enjoy outdoor activities and a peaceful setting. Lakeside Village is a serene neighborhood with a strong community spirit and outdoor recreational opportunities.'....

Generating image for listing with description: 'Spacious family home with a large foyer, an updated kitchen with granite countertops, and a master suite with dual vanities and a jetted tub. The property includes a landscaped yard and a three-car garage. Highland Park is known for its safe streets and luxurious homes.'....

Generating image for listing with description: 'Compact and affordable fixer-upper with great potential, needing updates and repairs. Ideal for investors or first-time buyers willing to put in some work. Brookside is a lower-income neighborhood with a real sense of community and convenient access to local amenities.'....

Generating image for listing with description: 'Elegant home in a gated community, featuring high ceilings, a chef's kitchen, and a covered patio perfect for entertaining. The master bedroom includes a large en-suite bathroom with a walk-in shower and a soaking tub. Silverleaf is a secure and private neighborhood with luxurious amenities and beautifully landscaped common areas.'....

Generating image for listing with description: 'This charming fixer-upper offers a unique opportunity for customization, with a spacious layout and original hardwood floors. The home is situated on a large lot with room for expansion. Meadowview is an affordable neighborhood with a friendly community and easy access to downtown.'....

Generating image for listing with description: 'Well-maintained ranch-style home with a modern kitchen, a cozy fireplace in the living room, and a large deck in the backyard. The home is energy-efficient with solar panels and a new water heater. Evergreen Terrace is a quiet neighborhood with green spaces and a close-knit community.'....

Generating image for listing with description: 'Modern home with an open floor plan, featuring a state-of-the-art kitchen, a spacious living area, and large windows that provide plenty of natural light. The home includes a beautifully landscaped garden and a two-car garage. Grandview Heights is a vibrant neighborhood with a variety of shops, restaurants, and cultural attractions.'....

We save the results again so the images don't have to be regenerated.

In [ ]:
with open(data_dir / "listings_with_image.pickle", "wb") as f:
    pickle.dump(listings_with_image, f)

Read the listings back.

In [ ]:
with open(data_dir / "listings_with_image.pickle", "rb") as f:
    listings_with_image = pickle.load(f)

Storing Listings in a Vector Database¶

Vector Database Setup¶

In [ ]:
table_name = "real_estate_listings"
In [ ]:
db = lancedb.connect(data_dir / "lancedb")
db.drop_table(table_name, ignore_missing=True)

We create the listings table using the RealEstateListingLanceRecord schema from the models module.

In [ ]:
table = db.create_table(table_name, schema=RealEstateListingLanceRecord)

Generating and Storing Embeddings¶

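We'll use a CLIP model from Hugging Face to generate an embedding for each listing. The function below stores the embedding of the listing's image; because CLIP maps text and images into a shared space, the table can later be searched with a text query.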

In [ ]:
clip_model = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(clip_model).to(device)
processor = CLIPProcessor.from_pretrained(clip_model)
tokenizer = AutoTokenizer.from_pretrained(clip_model)
In [ ]:
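def get_listing_embeddings(listing: RealEstateListing) -> torch.Tensor:
    # The processor prepares both text and image inputs, but only the image
    # tensor (pixel_values) is used below, so the stored vector is the CLIP
    # image embedding of the listing's picture.
    processor_output = processor(
        text=[listing.description + " " + listing.neighborhood_description],
        images=listing.image_as_pil,
        return_tensors="pt",
        padding=True,
        truncation=True
    )
    image = processor_output["pixel_values"].to(device)
    with torch.no_grad():  # inference only, no gradients needed
        image_embeddings = model.get_image_features(image)

    return image_embeddings[0].cpu()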
In [ ]:
table.add([
    RealEstateListingLanceRecord(
        **listing.model_dump(),
        vector=get_listing_embeddings(listing).detach().numpy()
    )
    for listing in listings_with_image
])

Querying the Vector Database¶

In [ ]:
inputs = tokenizer("Affordable house with backyard", padding=True, truncation=True, return_tensors="pt").to(device)
text_features = model.get_text_features(**inputs)[0].cpu().detach().numpy()
In [ ]:
Markdown(
    "\n".join(
        f"* _{listing.description}_" for listing in table.search(text_features).limit(3).to_pydantic(RealEstateListingLanceRecord)
    )
)
Out[ ]:
  • Affordable family home with recent updates, including new roofing and HVAC system. The home features a functional layout with a spacious living room and a kitchen with ample storage. Riverbend is a growing neighborhood with a mix of residential and commercial developments.
  • Cozy bungalow perfect for first-time homeowners, featuring an updated kitchen, a bright living room, and a small, manageable yard. This home is move-in ready and offers a comfortable living space at an affordable price. Willow Creek is a friendly neighborhood with a mix of older and newly renovated homes.
  • Well-maintained ranch-style home with a modern kitchen, a cozy fireplace in the living room, and a large deck in the backyard. The home is energy-efficient with solar panels and a new water heater. Evergreen Terrace is a quiet neighborhood with green spaces and a close-knit community.
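
To illustrate what the search call does conceptually, here is a minimal brute-force nearest-neighbour sketch in plain Python. It assumes an L2 (Euclidean) distance, which we understand to be LanceDB's default metric; the three-dimensional vectors are toy values chosen by hand, not real CLIP embeddings, and `l2_distance` and `search` are hypothetical helpers.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, records, limit=3):
    """Return the `limit` records whose vectors are closest to `query`."""
    return sorted(records, key=lambda r: l2_distance(query, r["vector"]))[:limit]


# Toy records: names come from the listings above, vectors are made up.
records = [
    {"name": "Riverbend", "vector": [0.9, 0.1, 0.0]},
    {"name": "Pine Ridge", "vector": [0.0, 0.2, 0.9]},
    {"name": "Willow Creek", "vector": [0.8, 0.3, 0.1]},
    {"name": "Silverleaf", "vector": [0.1, 0.1, 0.8]},
]
query = [1.0, 0.2, 0.0]

top = search(query, records, limit=2)
print([r["name"] for r in top])
```

A vector database avoids scanning every record like this once the table grows, typically by using an approximate index, but the ranking idea is the same.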